Fault-Tolerant Middleware and the Magical 1%

نویسندگان

  • Tudor Dumitras
  • Priya Narasimhan
چکیده

Through an extensive experimental analysis of over 900 possible configurations of a fault-tolerant middleware system, we present empirical evidence that the unpredictability inherent in such systems arises from merely 1% of the remote invocations. The occurrence of very high latencies cannot be regulated through parameters such as the number of clients, the replication style and degree or the request rates. However, by selectively filtering out a "magical 1%" of the raw observations of various metrics, we show that performance, in terms of measured end-to-end latency and throughput, can be bounded, easy to understand and control. This simple statistical technique enables us to guarantee, with some level of confidence, bounds for percentile-based quality of service (QoS) metrics, which dramatically increase our ability to tune and control a middleware system in a predictable manner.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Middleware for Constructing Highly Available, Fault Tolerant, and Attack Tolerant Services

This paper describes the design of a middleware that provides support for constructing highly available, secure, fault-tolerant, and attack-tolerant services. The central component of this middleware is a group communication service that comprises of six network protocols: atomic broadcast, group membership, failure detection, attack detection, group access control, and secure intermember commu...

متن کامل

Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems

Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties require the middleware to ...

متن کامل

Fault Tolerant Middleware for Agent Systems: A Refinement Approach

Agent technology offers a number of advantages over traditional distributed systems, such as asynchronous communication, anonymity of individual agents and ability to change operational context. However, it is notoriously difficult to ensure dependability of agent systems. In this paper we present a formal approach for the top-down development of fault tolerant middleware for agent systems. We ...

متن کامل

A study of unpredictability in fault-tolerant middleware

In enterprise applications relying on fault-tolerant middleware, it is a common engineering practice to establish service-level agreements (SLAs) based on the 95th or the 99th percentiles of the latency, to allow a margin for unexpected variability. However, the extent of this unpredictability has not been studied systematically. We present an extensive empirical study of unpredictability in 16...

متن کامل

Mission Statement: ToleranceZone A Self-Stabilizing Middleware for Wireless Sensor Netzworks

Wireless sensor networks (WSN) can be used in a wide range of monitoring and controlling applications. These networks consist of nodes with sparse resources, which makes application implementation challenging. Therefore, many middleware systems were developed in the last decade. Furthermore, unattended and long-living deployments of WSNs need fault-tolerant software architectures. The goal of t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005